SVD
Understanding the Geometry of \(A\in \mathbb{R}^{m\times n}\)
\[ A\in \mathbb{R}^{m\times n} \\ x\in \mathbb{R}^n \mapsto Ax \in \mathbb{R}^m \]
The map \(\mathbf{x} \mapsto A\mathbf{x}\) sends the unit circle to an ellipse. What we are formalizing is that this ellipse has a major axis and a minor axis: \(A\) stretches along these axes and rotates them out of the coordinate frame.
Critical points of the ratio:
\[ R(\mathbf{v}) = \frac{||A\mathbf{v}||_2}{||\mathbf{v}||_2} \]
Since \(R(\alpha \mathbf{v}) = R(\mathbf{v})\), we can assume \(||\mathbf{v}||_2 = 1\).
\[ \max_{\mathbf{v}}||A\mathbf{v}||_2 \quad \text{subject to } ||\mathbf{v}||_2 = 1 \rightarrow A^TA\mathbf{v} = \lambda \mathbf{v}, \lambda \geq 0 \]
By the Spectral Theorem, \(A^TA\) (symmetric and positive semidefinite) has \(n\) real, nonnegative eigenvalues and \(n\) orthonormal eigenvectors \(\mathbf{v}_1, \dots, \mathbf{v}_n\), which span \(\mathbb{R}^n\).
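As a quick numerical check (a minimal numpy sketch; the random matrix and seed are illustrative choices, not from the lecture), the top eigenvector of \(A^TA\) does maximize \(R\):

```python
import numpy as np

# Hypothetical example matrix; any A works here.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# eigh handles the symmetric matrix A^T A; eigenvalues come back ascending.
lam, V = np.linalg.eigh(A.T @ A)
v_top = V[:, -1]  # eigenvector for the largest eigenvalue

R = lambda v: np.linalg.norm(A @ v) / np.linalg.norm(v)
for _ in range(1000):  # no random direction should beat v_top
    v = rng.standard_normal(3)
    assert R(v) <= R(v_top) + 1e-12
print(R(v_top), np.sqrt(lam[-1]))  # both equal sqrt(lambda_max)
```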
Singular Value Decomposition
Define \(u_i^* = A\mathbf{v}_i\).
\[ \begin{aligned} &A^TA\mathbf{v}_i = \lambda_i \mathbf{v}_i \\ &\Rightarrow A^Tu_i^* = \lambda_i \mathbf{v}_i \\ &\Rightarrow A A^Tu_i^* = \lambda_i A\mathbf{v}_i = \lambda_i u_i^* \\ \end{aligned} \]
So, \(u_i^*\) is an eigenvector of \(AA^T\) when it is nonzero.
\[ ||u_i^*||_2 = ||A\mathbf{v}_i||_2 = \sqrt{\mathbf{v}_i^T A^TA \mathbf{v}_i} = \sqrt{\mathbf{v}_i^T \lambda_i \mathbf{v}_i} = \sqrt{\lambda_i} \]
Two cases:
- \(\lambda_i \neq 0 \Rightarrow u_i^* = A\mathbf{v}_i\) is an eigenvector of \(AA^T\) with \(||u_i^*||_2 = \sqrt{\lambda_i}\)
- \(\lambda_i = 0 \Rightarrow u_i^* = \mathbf{0}\) (since \(||u_i^*||_2 = \sqrt{\lambda_i} = 0\))
Take \(k:= \text{\# nonzero eigenvalues of } A^TA\).
\[ \exists \mathbf{v}_1, \dots, \mathbf{v}_k \in \mathbb{R}^n, u_1, \dots, u_k \in \mathbb{R}^m \text{ with: }\\ \begin{aligned} A^TA\mathbf{v}_i &= \lambda_i \mathbf{v}_i\\ AA^Tu_i &= \lambda_i u_i \quad (u_i := u_i^* / ||u_i^*||_2)\\ A\mathbf{v}_i &= \sqrt{\lambda_i} u_i\\ A^Tu_i &= \sqrt{\lambda_i} \mathbf{v}_i\\ \end{aligned}\\ \]
Observation
\[ u_i^TA\mathbf{v}_j = (A^Tu_i)^T \mathbf{v}_j = \sqrt{\lambda_i} \mathbf{v}_i^T \mathbf{v}_j = \sqrt{\lambda_i} \delta_{ij}\\ \]
\[ \overline{U} = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \dots & u_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{m\times k} \\ \overline{V} = \begin{bmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{n\times k} \\ \overline{U}^TA\overline{V} = \overline{\Sigma},\\ \overline{\Sigma} = \text{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_k}) \in \mathbb{R}^{k\times k} \]
Complete \(\overline{U} \rightarrow U \in \mathbb{R}^{m\times m}\) and \(\overline{V} \rightarrow V \in \mathbb{R}^{n\times n}\) by appending orthonormal bases for the nullspaces of \(A^T\) and \(A\), respectively.
Define \(\Sigma \in \mathbb{R}^{m\times n}\) s.t.
\[ \Sigma_{ij} = \begin{cases} \sqrt{\lambda_i}=\sigma_i & i = j \leq k \\ 0 & \text{otherwise} \end{cases}\\ A = U\Sigma V^T \\ U^TU = I_{m\times m}, V^TV = I_{n\times n} \] \(\Sigma\) is a diagonal matrix with nonnegative entries.
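A minimal numpy sketch of this construction (an illustration of mine, assuming \(A\) has full column rank so that every \(\lambda_i > 0\) and \(k = n\), which sidesteps the basis-completion step for \(U\)):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))   # full column rank almost surely

lam, V = np.linalg.eigh(A.T @ A)  # symmetric eigenproblem, ascending order
lam, V = lam[::-1], V[:, ::-1]    # reorder so sigma_1 >= sigma_2 >= ...
sigma = np.sqrt(lam)
U = (A @ V) / sigma               # u_i = A v_i / ||A v_i||_2, column by column

assert np.allclose(U.T @ U, np.eye(n))           # orthonormal columns
assert np.allclose(A, U @ np.diag(sigma) @ V.T)  # A = U Sigma V^T (thin)
```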
Geometry of SVD
\[ A = U\Sigma V^T \\ \]
- Rotate (\(V^T\))
- Scale (\(\Sigma\))
- Rotate (\(U\))
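A small sketch of this rotate, scale, rotate reading (the \(2\times 2\) matrix is an arbitrary example, not from the lecture):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0.0, 2.0 * np.pi, 100)
circle = np.stack([np.cos(theta), np.sin(theta)])  # unit circle, shape 2x100

step1 = Vt @ circle          # rotate (possibly reflect) into singular axes
step2 = np.diag(s) @ step1   # stretch axis i by sigma_i: circle -> ellipse
step3 = U @ step2            # rotate the ellipse into its final position
assert np.allclose(step3, A @ circle)
```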
SVD Vocabulary
- Left singular vectors: Columns of \(U\); span \(\text{col } A\)
- Right singular vectors: Columns of \(V\); span \(\text{row } A\)
- Singular values: Diagonal \(\sigma_i\) of \(\Sigma\); sort \(\sigma_1 \geq \sigma_2 \geq \dots \geq 0\)
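In numpy terms (a sketch for orientation, not lecture code), these conventions appear as:

```python
import numpy as np

# Arbitrary example; note np.linalg.svd returns V^T (here Vt), not V.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

U, s, Vt = np.linalg.svd(A)
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)  # sorted descending
# Left singular vectors: columns of U; right singular vectors: columns of Vt.T.
```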
Second Approach to SVD: Algebraic
\[ B:=\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \]
Proposition: Take \(\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2) \in \mathbb{R}^{m+n}\) to be an eigenvector of \(B\) with eigenvalue \(\lambda\), where \(\mathbf{x}_1 \in \mathbb{R}^n, \mathbf{x}_2 \in \mathbb{R}^m\). Then, \(\mathbf{x}':=(\mathbf{x}_1, -\mathbf{x}_2)\) is an eigenvector of \(B\) with eigenvalue \(-\lambda\). (Check: \(B\mathbf{x} = \lambda\mathbf{x}\) means \(A^T\mathbf{x}_2 = \lambda\mathbf{x}_1\) and \(A\mathbf{x}_1 = \lambda\mathbf{x}_2\), so \(B\mathbf{x}' = (A^T(-\mathbf{x}_2), A\mathbf{x}_1) = (-\lambda\mathbf{x}_1, \lambda\mathbf{x}_2) = -\lambda\mathbf{x}'\).)
On the board: \(A=U\Sigma V^T\) for diagonal \(\Sigma\) ("economy" SVD)
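A quick numerical check of this approach (a sketch; sizes are arbitrary): the eigenvalues of \(B\) come in \(\pm\sigma_i\) pairs plus \(|m-n|\) zeros, so the \(\sigma_i\) can be read off a symmetric eigensolve.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
A = rng.standard_normal((m, n))

B = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])
mu = np.linalg.eigvalsh(B)                   # B is symmetric: eigvalsh applies
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# Eigenvalues pair up as +/- sigma_i, padded with |m - n| zeros.
assert np.allclose(np.sort(mu)[-n:], np.sort(sigma))
assert np.allclose(np.sort(mu)[:n], -np.sort(sigma)[::-1])
```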
Many SVDs
- Full SVD: \(U \in \mathbb{R}^{m\times m}, \Sigma \in \mathbb{R}^{m\times n}, V \in \mathbb{R}^{n\times n}\)
- Thin SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \min\{m, n\}\)
- Compact SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \text{rank } (A)\)
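In numpy (a sketch): `full_matrices` toggles full vs. thin, and the compact variant keeps only the nonzero \(\sigma_i\).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=True)      # full: U is 6x6
Ut, st, Vtt = np.linalg.svd(A, full_matrices=False)  # thin: U is 6x3
assert U.shape == (6, 6) and Ut.shape == (6, 3)

r = int(np.sum(s > 1e-12))                  # numerical rank
Uc, sc, Vc = Ut[:, :r], st[:r], Vtt[:r, :]  # compact: keep nonzero sigma_i
assert np.allclose(A, Uc @ np.diag(sc) @ Vc)
```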
Computing SVD: Two Strategies
- Build from eigenvectors of \(A^TA\) and \(AA^T\)
- Build from eigenvectors of \(\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}\)
\(\exists\) more specialized methods!
Solving Linear Systems with \(A=U\Sigma V^T\)
\[ A\mathbf{x} = \mathbf{b}\\ \Rightarrow U\Sigma V^T \mathbf{x} = \mathbf{b}\\ \Rightarrow \mathbf{x} = V\Sigma^{-1} U^T \mathbf{b} \]
(This requires \(A\) square and invertible, so that \(\Sigma^{-1}\) exists; the general case follows below.)
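A sketch of this solve (the random square matrix, almost surely invertible, is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)
x = Vt.T @ ((U.T @ b) / s)   # Sigma^{-1} is an elementwise divide by sigma_i
assert np.allclose(A @ x, b)
```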
Uniting Short/Tall Matrices
\[ \min_{\mathbf{x}} ||A\mathbf{x} - \mathbf{b}||_2\rightarrow A^TA\mathbf{x} = A^T\mathbf{b}\\ \]
This can be recast as the constrained problem:
\[ \textrm{minimize } ||\mathbf{x}||_2^2 \quad \text{subject to } A^T A \mathbf{x} = A^T \mathbf{b} \]
The reason: the original problem has three cases:
1. \(A\) invertible (completely determined): a unique solution.
2. \(A\) overdetermined: a least-squares solution.
3. \(A\) underdetermined: infinitely many solutions.
In all three cases, the solution satisfies the constraint \(A^TA\mathbf{x} = A^T\mathbf{b}\).
\[ A=U\Sigma V^T \Rightarrow A^TA \mathbf{x} = V\Sigma^T U^T U\Sigma V^T \mathbf{x} = V\Sigma^T \Sigma V^T \mathbf{x},\\ A^T \mathbf{b} = V\Sigma^T U^T \mathbf{b}\\ \therefore A^TA\mathbf{x} = A^T\mathbf{b} \Rightarrow \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}, \quad \mathbf{y}:= V^T \mathbf{x}, \; \mathbf{d}:= U^T \mathbf{b}\\ \]
This transforms the problem into:
\[ \min ||\mathbf{y}||_2^2 \quad \text{s.t. } \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}\\ \rightarrow \sigma_i^2 y_i = \sigma_i d_i \quad \text{for each } i \Rightarrow y_i = \begin{cases} \frac{d_i}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}\\ \Rightarrow \mathbf{y} = \Sigma^{+} \mathbf{d}, \quad \Sigma^{+}_{ii} := \begin{cases} \frac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \text{otherwise} \end{cases}\\ \Rightarrow \mathbf{x} =V \mathbf{y} = V\Sigma^{+} U^T \mathbf{b} \]
(Entries \(y_i\) with \(\sigma_i = 0\) are unconstrained, so minimizing \(||\mathbf{y}||_2\) sets them to zero.)
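A sketch of this recipe on a deliberately rank-deficient \(A\) (the matrix and the `1e-12` cutoff are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 4))
A[:, 3] = A[:, 0] + A[:, 1]          # dependent column: rank A = 3
b = rng.standard_normal(5)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.zeros_like(s)
nz = s > 1e-12 * s[0]                # treat tiny sigma_i as exactly zero
s_plus[nz] = 1.0 / s[nz]
x = Vt.T @ (s_plus * (U.T @ b))      # x = V Sigma^+ U^T b

assert np.allclose(A.T @ A @ x, A.T @ b)        # normal equations hold
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # numpy's min-norm LS solution
assert np.allclose(x, x_ls)
```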
Pseudoinverse
\[ A^+ = V\Sigma^{+} U^T \]
Properties
- \(A\) square and invertible \(\Rightarrow A^+ = A^{-1}\)
- \(A\) overdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\)
- \(A\) underdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\) with least (Euclidean) norm
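These properties are easy to spot-check (a sketch; the overdetermined case is shown, with full column rank so all \(\sigma_i > 0\)):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1.0 / s) @ U.T     # A^+ = V Sigma^+ U^T

assert np.allclose(A_plus, np.linalg.pinv(A))
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A_plus @ b, x_ls)       # A^+ b is the least-squares x
```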
Alternative Form
\[ A=U\Sigma V^T \Rightarrow A = \sum_{i=1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T\\ l:= \min\{m, n\}\\ A\mathbf{x} = \sum_{i=1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{x}) \approx \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T \mathbf{x} \]
Outer Product
\[ \mathbf{u} \otimes \mathbf{v} := \mathbf{u} \mathbf{v}^T \]
Computing \(A\mathbf{x}\):
\[ A\mathbf{x} = \sum_{i=1}^{l} \sigma_i (\mathbf{v}_i \cdot \mathbf{x}) \mathbf{u}_i \]
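A sketch of this evaluation (arbitrary sizes), summing one rank-one term at a time:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 4))
x = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ax = sum(s[i] * (Vt[i] @ x) * U[:, i] for i in range(len(s)))
assert np.allclose(Ax, A @ x)
# Dropping the trailing small-sigma terms gives the truncation trick below.
```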
Trick: Ignore small \(\sigma_i\).
\[ A^+ = \sum_{\sigma_i \neq 0} \frac{\mathbf{v}_i \mathbf{u}_i^T}{\sigma_i} \]
Trick: Ignore large \(\sigma_i\) (their terms enter \(A^+\) with small coefficients \(1/\sigma_i\)).
Eckart-Young Theorem
Suppose \(\tilde{A}\) is obtained from \(A=U\Sigma V^T\) by truncating all but the \(k\) largest singular values \(\sigma_i\) of \(A\) to zero. Then, \(\tilde{A}\) minimizes both \(||A-\tilde{A}||_{\text{Fro}}\) and \(||A-\tilde{A}||_2\) subject to the constraint that the column space of \(\tilde{A}\) has at most dimension \(k\).
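A numerical illustration of the theorem (a sketch; the random rank-\(k\) competitors \(B\) are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]             # truncated SVD
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])  # error is sigma_{k+1}

for _ in range(100):  # A_k beats random rank-<=k competitors in both norms
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')
    assert np.linalg.norm(A - A_k, 2) <= np.linalg.norm(A - B, 2)
```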
Proof:
\[ A=U\Sigma V^T, \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_l \]
Define \(A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T, k \leq l\).
\[ \begin{aligned} ||A-A_k||_2 &= ||\sum_{i=k+1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T||_2 =: ||M||_2 = \sigma_{k+1} \end{aligned} \]
The last equality holds because \(\sigma_{k+1}\) is the largest singular value of \(M\), and it is attained:
\[ \mathbf{x} = \mathbf{v}_{k+1} \Rightarrow M\mathbf{x} = \sum_{i=k+1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{v}_{k+1}) = \sigma_{k+1} \mathbf{u}_{k+1} \Rightarrow ||M||_2 = \sigma_{k+1} \]
Suppose \(B = XY^T\) (so \(\text{rank } B \leq k\)), where \(X \in \mathbb{R}^{m\times k}, Y \in \mathbb{R}^{n\times k}\). Define \(V_{k+1} = \begin{bmatrix} \mathbf{v}_1 & \dots & \mathbf{v}_{k+1} \end{bmatrix} \in \mathbb{R}^{n\times (k+1)}\). Since \(Y^T V_{k+1} \in \mathbb{R}^{k\times (k+1)}\) has more columns than rows, it has a nontrivial nullspace \(\Rightarrow \exists \mathbf{z} \neq \mathbf{0}\) s.t. \(Y^T V_{k+1} \mathbf{z} = \mathbf{0}\). Scale \(\mathbf{z}\) s.t. \(||\mathbf{z}||_2 = 1\).
\[ \begin{aligned} ||A-B||_2^2 &= \max_{||\mathbf{q}||_2 = 1} ||(A-B)\mathbf{q}||_2^2 \\ &\geq ||(A-B)V_{k+1} \mathbf{z}||_2^2 = ||(U\Sigma V^T - XY^T)V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z} - XY^T V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z}||_2^2\\ &= \sum_{i=1}^{k+1} z_i^2 \sigma_i^2 \geq \sigma_{k+1}^2 = ||A-A_k||_2^2 \end{aligned} \]
Here the second step uses \(||V_{k+1}\mathbf{z}||_2 = ||\mathbf{z}||_2 = 1\) (orthonormal columns), and the final inequality uses \(\sigma_i \geq \sigma_{k+1}\) for \(i \leq k+1\) together with \(\sum_i z_i^2 = 1\).
Matrix Norm Expressions
\[ ||A||_{\text{Fro}}^2 = \sum_i \sigma_i^2 \qquad ||A||_2 = \max_i \sigma_i \qquad \text{cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}} \]
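Each identity can be checked against numpy's built-ins (a sketch):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 4))
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
assert np.isclose(np.linalg.norm(A, 2), s.max())
assert np.isclose(np.linalg.cond(A, 2), s.max() / s.min())
```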